Overview

Dataset statistics

Number of variables: 15
Number of observations: 501
Missing cells: 655
Missing cells (%): 8.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 58.8 KiB
Average record size in memory: 120.3 B
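The dataset-level counts follow from simple definitions. The sketch below is a minimal illustration, not pandas-profiling's own code. It assumes each row is a Python dict and treats None or NaN as a missing cell. The names `is_missing` and `overview` are hypothetical. For this dataset, 15 variables × 501 observations give 7,515 cells, and 655 / 7,515 ≈ 8.7%, which matches the reported missing-cell percentage.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing cells (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def overview(rows):
    """Dataset-level counts in the style of the 'Dataset statistics' table.

    `rows` is a list of dicts that all share the same keys (one key per column).
    """
    columns = list(rows[0]) if rows else []
    n_cells = len(rows) * len(columns)
    missing = sum(is_missing(row[c]) for row in rows for c in columns)

    # A row counts as a duplicate if an identical row appeared earlier.
    # Missing cells are normalised to None so that two NaNs compare equal here.
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(None if is_missing(row[c]) else row[c] for c in columns)
        if key in seen:
            duplicates += 1
        seen.add(key)

    return {
        "variables": len(columns),
        "observations": len(rows),
        "missing_cells": missing,
        "missing_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100 * duplicates / len(rows) if rows else 0.0,
    }


# Toy rows (hypothetical data): the second row duplicates the first.
rows = [
    {"name": "A", "value": None},
    {"name": "A", "value": float("nan")},
    {"name": "B", "value": None},
]
print(overview(rows))
```

Normalising missing cells before hashing is a design choice. Without it, two separate NaN objects would not compare equal, and the duplicate check would miss such rows.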

Variable types

NUM: 10
CAT: 4
UNSUPPORTED: 1

Reproduction

Analysis started: 2020-09-12 13:59:06.400442
Analysis finished: 2020-09-12 14:00:05.318149
Duration: 58.92 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Configuration: config.yaml

Warnings

City has high cardinality: 497 distinct values (High cardinality)
Female Population is highly correlated with Population [2011] (High correlation)
Population [2011] is highly correlated with Female Population (High correlation)
Population [2011] has 6 (1.2%) missing values (Missing)
Popuation [2001] has 501 (100.0%) missing values (Missing)
Median Age has 13 (2.6%) missing values (Missing)
Avg Temp has 14 (2.8%) missing values (Missing)
SWM has 9 (1.8%) missing values (Missing)
Toilets Avl has 22 (4.4%) missing values (Missing)
Water Purity has 19 (3.8%) missing values (Missing)
H Index has 15 (3.0%) missing values (Missing)
Female Population has 15 (3.0%) missing values (Missing)
# of hospitals has 17 (3.4%) missing values (Missing)
Foreign Visitors has 17 (3.4%) missing values (Missing)
City is uniformly distributed (Uniform)
Popuation [2001] has an unsupported type; check whether it needs cleaning or further analysis (Unsupported)

Variables

City
Categorical

HIGH CARDINALITY
UNIFORM

Distinct count: 497
Unique (%): 99.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 KiB

Value | Count | Frequency (%)
Pratapgarh | 2 | 0.4%
Narsinghgarh | 2 | 0.4%
Sumerpur | 2 | 0.4%
Shahpura | 2 | 0.4%
Warhapur | 1 | 0.2%
Doiwala | 1 | 0.2%
Nandaprayag | 1 | 0.2%
Shikaripur | 1 | 0.2%
Pavagada | 1 | 0.2%
Dharchula | 1 | 0.2%
Other values (487) | 487 | 97.2%

Length

Max length: 24
Median length: 8
Mean length: 8.546906188
Min length: 3

State
Categorical

Distinct count: 29
Unique (%): 5.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 KiB

Value | Count | Frequency (%)
Uttar Pradesh | 56 | 11.2%
Maharashtra | 51 | 10.2%
Uttarakhand | 51 | 10.2%
Tamil Nadu | 51 | 10.2%
Rajasthan | 48 | 9.6%
Karnataka | 38 | 7.6%
Madhya Pradesh | 37 | 7.4%
Gujarat | 22 | 4.4%
Bihar | 22 | 4.4%
Kerala | 18 | 3.6%
Other values (19) | 107 | 21.4%

Length

Max length: 22
Median length: 10
Mean length: 10.04191617
Min length: 5

Type
Categorical

Distinct count: 32
Unique (%): 6.4%
Missing: 2
Missing (%): 0.4%
Memory size: 3.9 KiB

Value | Count | Frequency (%)
M | 119 | 23.8%
N.P | 86 | 17.2%
M.Cl | 61 | 12.2%
T.P | 42 | 8.4%
C.T | 41 | 8.2%
T.M.C | 22 | 4.4%
N.P.P | 19 | 3.8%
N.A | 19 | 3.8%
M.B | 16 | 3.2%
UA | 8 | 1.6%
Other values (22) | 66 | 13.2%

Length

Max length: 6
Median length: 3
Mean length: 2.932135729
Min length: 1

Population [2011]
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct count: 486
Unique (%): 98.2%
Missing: 6
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 24747.468686868688
Minimum: 110.0
Maximum: 36774.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 110
5th percentile: 7577.7
Q1: 21435
Median: 25199
Q3: 30763
95th percentile: 35414.3
Maximum: 36774
Range: 36664
Interquartile range (IQR): 9328

Descriptive statistics

Standard deviation: 7813.0675
Coefficient of variation (CV): 0.3157117845
Kurtosis: 0.7106377163
Mean: 24747.46869
Median Absolute Deviation (MAD): 4344
Skewness: -0.9115545052
Sum: 12249997
Variance: 61044023.76
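The standard library can reproduce the quantile and descriptive statistics above. This is a sketch that makes four assumptions:

- Missing values are dropped.
- The standard deviation uses the n - 1 denominator, which is pandas' default.
- Quantiles use linear interpolation between ranks, which is the NumPy/pandas default.
- MAD is the median of absolute deviations from the median.

`quantile` and `describe` are hypothetical helper names. The report's figures are consistent with these choices. For Population [2011], Sum / Mean = 12249997 / 24747.46869 ≈ 495, which is 501 rows minus 6 missing. CV = 7813.0675 / 24747.46869 ≈ 0.3157.

```python
import math
import statistics


def quantile(sorted_xs, q):
    """Quantile by linear interpolation between the two closest ranks."""
    pos = q * (len(sorted_xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


def describe(values):
    """Quantile and descriptive statistics for one numeric column."""
    # Drop missing entries (None or NaN) before computing anything.
    xs = sorted(v for v in values if v is not None and not math.isnan(v))
    mean = statistics.mean(xs)
    median = statistics.median(xs)
    std = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "minimum": xs[0],
        "p5": quantile(xs, 0.05),
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": quantile(xs, 0.95),
        "maximum": xs[-1],
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "mad": statistics.median(abs(x - median) for x in xs),
        "sum": sum(xs),
    }


# Usage: Median Age has only ten distinct values, all listed in its value
# counts, so its non-missing entries can be rebuilt exactly from that table.
counts = {23: 64, 24: 75, 25: 71, 26: 70, 27: 54, 28: 68, 29: 74, 30: 5, 31: 2, 32: 5}
median_age = [age for age, n in counts.items() for _ in range(n)] + [None] * 13
stats = describe(median_age)
print(stats["median"], stats["iqr"], stats["mad"], stats["sum"], stats["variance"])
```

Rebuilding Median Age this way reproduces the reported median (26), IQR (4), MAD (2), sum (12747) and variance (4.603422594). The variance match supports the n - 1 assumption.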

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
23234 | 3 | 0.6%
23456 | 2 | 0.4%
27815 | 2 | 0.4%
22781 | 2 | 0.4%
22516 | 2 | 0.4%
21643 | 2 | 0.4%
6309 | 2 | 0.4%
23331 | 2 | 0.4%
36172 | 1 | 0.2%
26521 | 1 | 0.2%
Other values (476) | 476 | 95.0%
(Missing) | 6 | 1.2%

Minimum 5 values
Value | Count | Frequency (%)
110 | 1 | 0.2%
612 | 1 | 0.2%
1517 | 1 | 0.2%
1641 | 1 | 0.2%
2152 | 1 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
36774 | 1 | 0.2%
36754 | 1 | 0.2%
36732 | 1 | 0.2%
36706 | 1 | 0.2%
36669 | 1 | 0.2%
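The histograms use ten fixed-size (equal-width) bins spanning each variable's minimum to maximum. Below is a minimal sketch of that binning. It assumes numpy.histogram's convention: every bin is half-open [left, right) except the last, which also includes the maximum. `fixed_width_histogram` is a hypothetical name, and whether the report bins in exactly this way is an assumption. For Population [2011], the bin width would be (36774 - 110) / 10 = 3666.4.

```python
def fixed_width_histogram(values, bins=10):
    """Count values into `bins` equal-width bins spanning [min, max].

    Bins are half-open [left, right) except the last, which is closed so the
    maximum is counted. Values lying exactly on an interior edge may land in
    a neighbouring bin because of floating-point rounding.
    """
    xs = [v for v in values if v is not None and v == v]  # v == v drops NaN
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in xs:
        # If all values are equal (width 0), everything goes into bin 0.
        idx = 0 if width == 0 else min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


# Four Population [2011] values taken from the extreme-value tables above.
edges, counts = fixed_width_histogram([110, 612, 1517, 36774], bins=10)
print(edges[1] - edges[0], counts)
```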
 

Popuation [2001]
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 501
Missing (%): 100.0%
Memory size: 4.0 KiB

Sex Ratio
Real number (ℝ≥0)

Distinct count: 145
Unique (%): 29.2%
Missing: 5
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 895.508064516129
Minimum: 774.0
Maximum: 991.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 774
5th percentile: 839.5
Q1: 867.75
Median: 890.5
Q3: 922
95th percentile: 963
Maximum: 991
Range: 217
Interquartile range (IQR): 54.25

Descriptive statistics

Standard deviation: 38.46415011
Coefficient of variation (CV): 0.0429523213
Kurtosis: -0.5558492085
Mean: 895.5080645
Median Absolute Deviation (MAD): 27.5
Skewness: 0.2749158367
Sum: 444172
Variance: 1479.490844

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
869 | 10 | 2.0%
910 | 10 | 2.0%
868 | 9 | 1.8%
875 | 9 | 1.8%
874 | 8 | 1.6%
870 | 8 | 1.6%
919 | 8 | 1.6%
877 | 7 | 1.4%
876 | 7 | 1.4%
863 | 7 | 1.4%
Other values (135) | 413 | 82.4%

Minimum 5 values
Value | Count | Frequency (%)
774 | 1 | 0.2%
823 | 1 | 0.2%
827 | 1 | 0.2%
831 | 5 | 1.0%
832 | 1 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
991 | 1 | 0.2%
986 | 1 | 0.2%
985 | 1 | 0.2%
983 | 1 | 0.2%
982 | 1 | 0.2%

Median Age
Real number (ℝ≥0)

MISSING

Distinct count: 10
Unique (%): 2.0%
Missing: 13
Missing (%): 2.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.12090163934426
Minimum: 23.0
Maximum: 32.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 23
5th percentile: 23
Q1: 24
Median: 26
Q3: 28
95th percentile: 29
Maximum: 32
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.145558807
Coefficient of variation (CV): 0.08213953854
Kurtosis: -0.8697617891
Mean: 26.12090164
Median Absolute Deviation (MAD): 2
Skewness: 0.2156424531
Sum: 12747
Variance: 4.603422594

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
24 | 75 | 15.0%
29 | 74 | 14.8%
25 | 71 | 14.2%
26 | 70 | 14.0%
28 | 68 | 13.6%
23 | 64 | 12.8%
27 | 54 | 10.8%
30 | 5 | 1.0%
32 | 5 | 1.0%
31 | 2 | 0.4%
(Missing) | 13 | 2.6%

Minimum 5 values
Value | Count | Frequency (%)
23 | 64 | 12.8%
24 | 75 | 15.0%
25 | 71 | 14.2%
26 | 70 | 14.0%
27 | 54 | 10.8%

Maximum 5 values
Value | Count | Frequency (%)
32 | 5 | 1.0%
31 | 2 | 0.4%
30 | 5 | 1.0%
29 | 74 | 14.8%
28 | 68 | 13.6%

Avg Temp
Real number (ℝ≥0)

MISSING

Distinct count: 27
Unique (%): 5.5%
Missing: 14
Missing (%): 2.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.100616016427104
Minimum: 5.0
Maximum: 40.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 5
5th percentile: 9
Q1: 26
Median: 31
Q3: 36
95th percentile: 39
Maximum: 40
Range: 35
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 9.295787556
Coefficient of variation (CV): 0.3194361092
Kurtosis: 0.3458727314
Mean: 29.10061602
Median Absolute Deviation (MAD): 5
Skewness: -1.139441722
Sum: 14172
Variance: 86.41166629

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
35 | 38 | 7.6%
34 | 37 | 7.4%
38 | 32 | 6.4%
26 | 30 | 6.0%
39 | 30 | 6.0%
25 | 25 | 5.0%
37 | 25 | 5.0%
31 | 23 | 4.6%
28 | 23 | 4.6%
29 | 22 | 4.4%
Other values (17) | 202 | 40.3%

Minimum 5 values
Value | Count | Frequency (%)
5 | 7 | 1.4%
6 | 3 | 0.6%
7 | 5 | 1.0%
8 | 9 | 1.8%
9 | 6 | 1.2%

Maximum 5 values
Value | Count | Frequency (%)
40 | 20 | 4.0%
39 | 30 | 6.0%
38 | 32 | 6.4%
37 | 25 | 5.0%
36 | 19 | 3.8%

SWM
Categorical

MISSING

Distinct count: 3
Unique (%): 0.6%
Missing: 9
Missing (%): 1.8%
Memory size: 3.9 KiB

Value | Count | Frequency (%)
LOW | 179 | 35.7%
HIGH | 158 | 31.5%
MEDIUM | 155 | 30.9%
(Missing) | 9 | 1.8%

Length

Max length: 6
Median length: 4
Mean length: 4.243512974
Min length: 3
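The SWM length figures can be checked against its value counts. The reported mean length of 4.243512974 equals 2126 / 501, where 2126 = 3·179 (LOW) + 4·158 (HIGH) + 6·155 (MEDIUM) + 3·9. This suggests that the 9 missing cells were counted as the 3-character string "nan" and that the mean was taken over all 501 rows. That is an inference from the numbers, not documented behaviour. The sketch below, with the hypothetical helper `length_summary`, reproduces that assumption by converting every cell with `str()`.

```python
import statistics


def length_summary(values):
    """Max/median/mean/min string length over all cells of a column.

    Every value is converted with str(), so a missing float('nan') cell
    contributes len("nan") == 3 (the assumption discussed above).
    """
    lengths = [len(str(v)) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }


# SWM value counts from the table above, with its 9 missing cells as NaN.
swm = ["LOW"] * 179 + ["HIGH"] * 158 + ["MEDIUM"] * 155 + [float("nan")] * 9
print(length_summary(swm))
```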

Toilets Avl
Real number (ℝ≥0)

MISSING

Distinct count: 62
Unique (%): 12.9%
Missing: 22
Missing (%): 4.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.2776617954071
Minimum: 10.0
Maximum: 100.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 10
5th percentile: 18.9
Q1: 61
Median: 74
Q3: 90
95th percentile: 98
Maximum: 100
Range: 90
Interquartile range (IQR): 29

Descriptive statistics

Standard deviation: 20.79900178
Coefficient of variation (CV): 0.2877652826
Kurtosis: 1.131275088
Mean: 72.2776618
Median Absolute Deviation (MAD): 15
Skewness: -1.039213286
Sum: 34621
Variance: 432.5984749

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
69 | 17 | 3.4%
97 | 16 | 3.2%
92 | 15 | 3.0%
94 | 15 | 3.0%
96 | 13 | 2.6%
71 | 13 | 2.6%
50 | 12 | 2.4%
95 | 12 | 2.4%
80 | 12 | 2.4%
57 | 12 | 2.4%
Other values (52) | 342 | 68.3%
(Missing) | 22 | 4.4%

Minimum 5 values
Value | Count | Frequency (%)
10 | 2 | 0.4%
11 | 2 | 0.4%
12 | 4 | 0.8%
13 | 1 | 0.2%
14 | 3 | 0.6%

Maximum 5 values
Value | Count | Frequency (%)
100 | 9 | 1.8%
99 | 9 | 1.8%
98 | 7 | 1.4%
97 | 16 | 3.2%
96 | 13 | 2.6%

Water Purity
Real number (ℝ≥0)

MISSING

Distinct count: 99
Unique (%): 20.5%
Missing: 19
Missing (%): 3.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 151.35892116182572
Minimum: 100.0
Maximum: 200.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 100
5th percentile: 105
Q1: 127
Median: 152
Q3: 175
95th percentile: 194.95
Maximum: 200
Range: 100
Interquartile range (IQR): 48

Descriptive statistics

Standard deviation: 28.71919055
Coefficient of variation (CV): 0.1897423048
Kurtosis: -1.168462088
Mean: 151.3589212
Median Absolute Deviation (MAD): 24.5
Skewness: -0.09407192221
Sum: 72955
Variance: 824.7919057

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
114 | 10 | 2.0%
136 | 10 | 2.0%
144 | 10 | 2.0%
124 | 9 | 1.8%
110 | 9 | 1.8%
166 | 9 | 1.8%
146 | 9 | 1.8%
178 | 9 | 1.8%
160 | 9 | 1.8%
174 | 8 | 1.6%
Other values (89) | 390 | 77.8%
(Missing) | 19 | 3.8%

Minimum 5 values
Value | Count | Frequency (%)
100 | 7 | 1.4%
101 | 2 | 0.4%
102 | 2 | 0.4%
103 | 4 | 0.8%
104 | 5 | 1.0%

Maximum 5 values
Value | Count | Frequency (%)
200 | 4 | 0.8%
199 | 4 | 0.8%
198 | 8 | 1.6%
197 | 6 | 1.2%
195 | 3 | 0.6%

H Index
Real number (ℝ≥0)

MISSING

Distinct count: 486
Unique (%): 100.0%
Missing: 15
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.501041634853053
Minimum: 0.0009574363037994083
Maximum: 0.9999010902726044
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 0.0009574363038
5th percentile: 0.062014457
Q1: 0.2666187181
Median: 0.5082177095
Q3: 0.737776037
95th percentile: 0.944058368
Maximum: 0.9999010903
Range: 0.998943654
Interquartile range (IQR): 0.4711573189

Descriptive statistics

Standard deviation: 0.2843004523
Coefficient of variation (CV): 0.5674188182
Kurtosis: -1.138861144
Mean: 0.5010416349
Median Absolute Deviation (MAD): 0.2379787621
Skewness: 0.004863596619
Sum: 243.5062345
Variance: 0.08082674719

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0.8867904331 | 1 | 0.2%
0.1746321156 | 1 | 0.2%
0.3189381151 | 1 | 0.2%
0.403033013 | 1 | 0.2%
0.2414900063 | 1 | 0.2%
0.7442951681 | 1 | 0.2%
0.08814416623 | 1 | 0.2%
0.8996637671 | 1 | 0.2%
0.630238193 | 1 | 0.2%
0.63898617 | 1 | 0.2%
Other values (476) | 476 | 95.0%
(Missing) | 15 | 3.0%

Minimum 5 values
Value | Count | Frequency (%)
0.0009574363038 | 1 | 0.2%
0.002001209807 | 1 | 0.2%
0.003805058303 | 1 | 0.2%
0.004338165469 | 1 | 0.2%
0.005412633668 | 1 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
0.9999010903 | 1 | 0.2%
0.997322788 | 1 | 0.2%
0.9961231447 | 1 | 0.2%
0.9952600462 | 1 | 0.2%
0.9919635495 | 1 | 0.2%

Female Population
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct count: 482
Unique (%): 99.2%
Missing: 15
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22542.633744855968
Minimum: 0.0
Maximum: 34523.0
Zeros: 1
Zeros (%): 0.2%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 7698.5
Q1: 19449.75
Median: 22998.5
Q3: 27701.75
95th percentile: 31957.5
Maximum: 34523
Range: 34523
Interquartile range (IQR): 8252

Descriptive statistics

Standard deviation: 6931.232314
Coefficient of variation (CV): 0.3074721611
Kurtosis: 1.044417715
Mean: 22542.63374
Median Absolute Deviation (MAD): 3994.5
Skewness: -0.9312305215
Sum: 10955720
Variance: 48041981.4

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
19004 | 2 | 0.4%
21857 | 2 | 0.4%
19316 | 2 | 0.4%
27362 | 2 | 0.4%
27600 | 1 | 0.2%
5382 | 1 | 0.2%
19529 | 1 | 0.2%
8878 | 1 | 0.2%
19336 | 1 | 0.2%
30221 | 1 | 0.2%
Other values (472) | 472 | 94.2%
(Missing) | 15 | 3.0%

Minimum 5 values
Value | Count | Frequency (%)
0 | 1 | 0.2%
94 | 1 | 0.2%
522 | 1 | 0.2%
1292 | 1 | 0.2%
1392 | 1 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
34523 | 1 | 0.2%
34360 | 1 | 0.2%
34328 | 1 | 0.2%
34237 | 1 | 0.2%
34114 | 1 | 0.2%

# of hospitals
Real number (ℝ≥0)

MISSING

Distinct count: 27
Unique (%): 5.6%
Missing: 17
Missing (%): 3.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.173553719008265
Minimum: 3.0
Maximum: 30.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 3
5th percentile: 7.15
Q1: 14
Median: 20
Q3: 25
95th percentile: 29
Maximum: 30
Range: 27
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 6.69714897
Coefficient of variation (CV): 0.3492909592
Kurtosis: -0.6789259198
Mean: 19.17355372
Median Absolute Deviation (MAD): 5
Skewness: -0.3042408893
Sum: 9280
Variance: 44.85180432

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
21 | 33 | 6.6%
28 | 29 | 5.8%
12 | 26 | 5.2%
20 | 25 | 5.0%
23 | 24 | 4.8%
19 | 24 | 4.8%
15 | 23 | 4.6%
24 | 22 | 4.4%
11 | 21 | 4.2%
26 | 21 | 4.2%
Other values (17) | 236 | 47.1%

Minimum 5 values
Value | Count | Frequency (%)
3 | 3 | 0.6%
4 | 9 | 1.8%
5 | 6 | 1.2%
6 | 5 | 1.0%
7 | 2 | 0.4%

Maximum 5 values
Value | Count | Frequency (%)
30 | 16 | 3.2%
29 | 19 | 3.8%
28 | 29 | 5.8%
27 | 17 | 3.4%
26 | 21 | 4.2%

Foreign Visitors
Real number (ℝ≥0)

MISSING

Distinct count: 28
Unique (%): 5.8%
Missing: 17
Missing (%): 3.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 1676300.902892562
Minimum: 798.0
Maximum: 4684707.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 798
5th percentile: 34886
Q1: 284973
Median: 923737
Q3: 3104060
95th percentile: 4684707
Maximum: 4684707
Range: 4683909
Interquartile range (IQR): 2819087

Descriptive statistics

Standard deviation: 1704860.432
Coefficient of variation (CV): 1.017037233
Kurtosis: -1.041338164
Mean: 1676300.903
Median Absolute Deviation (MAD): 755952
Skewness: 0.7625311928
Sum: 811329637
Variance: 2.906549094e+12

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
3104060 | 56 | 11.2%
4684707 | 49 | 9.8%
4408916 | 49 | 9.8%
1475311 | 47 | 9.4%
105882 | 45 | 9.0%
636502 | 37 | 7.4%
421365 | 35 | 7.0%
923737 | 22 | 4.4%
284973 | 22 | 4.4%
977479 | 18 | 3.6%
Other values (18) | 104 | 20.8%
(Missing) | 17 | 3.4%

Minimum 5 values
Value | Count | Frequency (%)
798 | 1 | 0.2%
1797 | 1 | 0.2%
2769 | 3 | 0.6%
3260 | 2 | 0.4%
5705 | 2 | 0.4%

Maximum 5 values
Value | Count | Frequency (%)
4684707 | 49 | 9.8%
4408916 | 49 | 9.8%
3104060 | 56 | 11.2%
1489500 | 14 | 2.8%
1475311 | 47 | 9.4%

Interactions

Pairwise interaction plots for the 10 numeric variables (100 figures, not reproduced here).

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables. This implies that, for an exact linear relationship, the steepness of the line (its angle to the x-axis) does not affect r; only the direction of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
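That definition translates directly into Python. This is a sketch that assumes complete (non-missing) numeric pairs and non-constant inputs; `pearson_r` is a hypothetical name. The second call in the usage lines illustrates invariance under separate changes in location and positive scale. A negative scale factor would flip the sign of r.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their std devs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/(n - 1) normalisations of the covariance and of both standard
    # deviations cancel, so they are left out.
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
print(pearson_r(xs, ys))
# Same data after shifting and (positively) rescaling each variable separately.
print(pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys]))
```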

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
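Spearman's ρ is Pearson's r applied to ranks. The sketch below assumes that tied values receive the average of the ranks they span, which is the usual convention. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical. The usage lines show a monotonic but non-linear relationship (y = x³), where ρ is 1 and Pearson's r falls below 1.

```python
import math


def average_ranks(values):
    """1-based ranks in which tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance over the product of standard deviations (n factors cancel)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```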

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
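Counting pairs exactly as described gives the tau-a variant of Kendall's τ. In the sketch below, pairs tied in X or Y count as neither concordant nor discordant. The work grows quadratically with the number of rows, which is manageable for a dataset of a few hundred rows like this one. `kendall_tau_a` is a hypothetical name. SciPy's `scipy.stats.kendalltau` defaults to tau-b, which adjusts the denominator for ties. Results can therefore differ on tied data such as Median Age or # of hospitals here, and the two variants coincide when there are no ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in X or Y: neither concordant nor discordant.
    return (concordant - discordant) / (n * (n - 1) / 2)


# One swapped pair among six: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```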

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
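The bias correction can be written out explicitly. The sketch below follows the correction usually attributed to Bergsma (2013):

- φ² = χ²/n is reduced by (k - 1)(r - 1)/(n - 1) and floored at 0.
- The table dimensions are replaced by r - (r - 1)²/(n - 1) and k - (k - 1)²/(n - 1).

The sketch assumes Pearson's χ² without a continuity correction. It also assumes a table with at least two rows and two columns and no empty row or column. Whether the report's implementation matches in every detail is an assumption, and `cramers_v_corrected` is a hypothetical name.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts.

    Assumes r, k >= 2, n > 1, and that every row and column total is positive.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(k)]

    # Pearson's chi-squared statistic (no continuity correction).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Bias-corrected phi^2 and effective numbers of rows and columns.
    phi2 = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2 / min(k_corr - 1, r_corr - 1))


print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence
```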

Missing values

Missing-value plots (4 figures, not reproduced here).

Sample

First rows

Row | City | State | Type | Population [2011] | Popuation [2001] | Sex Ratio | Median Age | Avg Temp | SWM | Toilets Avl | Water Purity | H Index | Female Population | # of hospitals | Foreign Visitors
0 | Tuensang | Nagaland | T.C | 36774.0 | NaN | 931.0 | 23.0 | 10.0 | MEDIUM | 94.0 | 114.0 | 0.253390 | 34237.0 | 17.0 | 2769.0
1 | Lakshmeshwar | Karnataka | T.M.C | 36754.0 | NaN | 934.0 | 25.0 | 38.0 | HIGH | 62.0 | 160.0 | 0.192555 | 34328.0 | 13.0 | 636502.0
2 | Zira | Punjab | M.Cl. | 36732.0 | NaN | 883.0 | 29.0 | 35.0 | HIGH | 63.0 | 105.0 | 0.887882 | 32434.0 | 17.0 | 242367.0
3 | Yawal | Maharashtra | M.Cl | 36706.0 | NaN | 887.0 | 26.0 | 31.0 | HIGH | 60.0 | 174.0 | 0.407838 | 32558.0 | 11.0 | 4408916.0
4 | Thana Bhawan | Uttar Pradesh | N.P. | 36669.0 | NaN | 877.0 | 28.0 | 39.0 | LOW | 92.0 | 153.0 | 0.324456 | 32159.0 | 23.0 | 3104060.0
5 | Ramdurg | Karnataka | UA | 36649.0 | NaN | 942.0 | 27.0 | 28.0 | MEDIUM | 92.0 | 185.0 | 0.571883 | 34523.0 | 30.0 | 636502.0
6 | Pulgaon | Maharashtra | M.Cl | 36522.0 | NaN | 887.0 | 26.0 | 31.0 | MEDIUM | 72.0 | 108.0 | 0.271195 | 32395.0 | 11.0 | 4408916.0
7 | Sadasivpet | Telangana | M | 36334.0 | NaN | 921.0 | 27.0 | 40.0 | LOW | 70.0 | 116.0 | 0.494227 | 33464.0 | 17.0 | 126078.0
8 | Nargund | Karnataka | T.M.C | 36291.0 | NaN | 940.0 | 23.0 | 37.0 | LOW | 77.0 | 148.0 | 0.708562 | 34114.0 | 21.0 | 636502.0
9 | Neem-Ka-Thana | Rajasthan | M | 36231.0 | NaN | 850.0 | 25.0 | 25.0 | MEDIUM | 61.0 | 148.0 | 0.592325 | 30796.0 | 29.0 | 1475311.0

Last rows

Row | City | State | Type | Population [2011] | Popuation [2001] | Sex Ratio | Median Age | Avg Temp | SWM | Toilets Avl | Water Purity | H Index | Female Population | # of hospitals | Foreign Visitors
491 | Bhaiseena | Rajasthan | G.P | 3200.0 | NaN | 869.0 | 24.0 | 34.0 | LOW | 17.0 | 167.0 | 0.092957 | 2781.0 | 4.0 | 1475311.0
492 | Dwarahat | Uttarakhand | N.P | 2749.0 | NaN | 836.0 | 25.0 | 12.0 | HIGH | 18.0 | 146.0 | 0.186739 | 2298.0 | 8.0 | 105882.0
493 | Badrinath | Uttarakhand | N.P | 2438.0 | NaN | 848.0 | 29.0 | 12.0 | LOW | 19.0 | 190.0 | 0.432991 | 2067.0 | 4.0 | 105882.0
494 | Dogadda | Uttarakhand | N.P.P | 2422.0 | NaN | 840.0 | 26.0 | 11.0 | HIGH | 11.0 | 146.0 | 0.030421 | 2034.0 | 4.0 | 105882.0
495 | Devprayag | Uttarakhand | N.P | 2152.0 | NaN | 840.0 | 29.0 | 7.0 | MEDIUM | 14.0 | 124.0 | 0.503070 | 1808.0 | 8.0 | 105882.0
496 | Nandaprayag | Uttarakhand | N.P | 1641.0 | NaN | 848.0 | 27.0 | 7.0 | MEDIUM | 12.0 | 181.0 | 0.316926 | 1392.0 | 4.0 | 105882.0
497 | Kirtinagar | Uttarakhand | N.P | 1517.0 | NaN | 852.0 | 28.0 | 12.0 | HIGH | 16.0 | 198.0 | 0.336852 | 1292.0 | 6.0 | 105882.0
498 | Kedarnath | Uttarakhand | N.P | 612.0 | NaN | 853.0 | 24.0 | 9.0 | LOW | 19.0 | 189.0 | 0.723253 | 522.0 | 6.0 | 105882.0
499 | Gangotri | Uttarakhand | N.P | 110.0 | NaN | 852.0 | 27.0 | 8.0 | MEDIUM | 18.0 | 170.0 | 0.421061 | 94.0 | 8.0 | 105882.0
500 | Kumarganj | Uttar Pradesh | C.T | NaN | NaN | 863.0 | 24.0 | 35.0 | HIGH | 19.0 | 149.0 | 0.154375 | 0.0 | 6.0 | 3104060.0